Variable Speech Rate Mandarin Chinese Text-to-Speech System

نویسندگان

  • Chen-Yu Chiang
  • Qi-Quan Huang
  • Yih-Ru Wang
  • Hsiu-Min Yu
  • Sin-Horng Chen
چکیده

This paper presents an Hidden Markov Model (HMM)-based variable speech rate Mandarin Chinese text-to-speech (TTS) system. In this system, parameters of spectrum, fundametal frequency and state duration are generated by a context dependent HMM (CDHMM) whose model parameters are linear-interpolated from those of three CDHMMs trained by corpora in three different speech rates (SRs), i.e. fast, medium and slow. In addition, three decision tree (DT)-based pause break predictors trained by using the three SR corpora are used to interpolate the probabilities for inserting pause breaks. The performance of the proposed TTS system were evaluated by several objective and subjective tests. Experimental results suggested that coherence between interpolation weights for CDHMMs and DT-based pasue predictors is crutial for naturalness of the synthesis speech in variable SR. We believe that the proposed variable speech rate Mandarin Chinese TTS system is more suitable than conventional fixed SR TTS systems for applications of human-machine interaction.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speech Is 3x Faster than Typing for English and Mandarin Text Entry on Mobile Devices

With laptops and desktops, the dominant method of text entry is the full-size keyboard; now with the ubiquity of mobile devices like smartphones, two new widely used methods have emerged: miniature touch screen keyboards and speech-based dictation. It is currently unknown how these two modern methods compare. We therefore evaluated the text entry performance of both methods in English and in Ma...

متن کامل

A set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese

This paper presents a set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese. A large speech corpus produced by a single speaker is used, and the speech output is synthesized from waveform units of variable lengths, with desired linguistic properties, retrieved from this corpus. Detailed methodologies were developed for designing “phonetically rich” and “prosodically ric...

متن کامل

Mandarin-English Information (MEI)

Mandarin-English Information (MEI) is one of the four projects selected for the Johns Hopkins University Summer Workshop 2000. We plan to develop technologies for using written queries to search spoken documents (cross-media) between English and Mandarin Chinese (cross-language). Our research focus is on the integration of speech recognition and machine translation technologies in the context o...

متن کامل

Statistical Analysis of Mandarin Acoustic Units and Automatic Extraction of Phonetically Rich Sentences Based Upon a very Large Chinese Text Corpus

Automatic speech recognition by computers can provide humans with the most convenient method to communicate with computers. Because the Chinese language is not alphabetic and input of Chinese characters into computers is very difficult, Mandarin speech recognition is very highly desired. Recently, high performance speech recognition systems have begun to emerge from research institutes. However...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IJCLCLP

دوره 17  شماره 

صفحات  -

تاریخ انتشار 2012